1 Runtime identification of cache conflict misses: The adaptive miss buffer
Jamison D. Collins, Dean M. Tullsen 

November 2001 ACM Transactions on Computer Systems (TOCS), volume 19 issue 4 
Full text available: pdf(1.08 MB) Additional Information: full citation, abstract, references, index terms

This paper describes the miss classification table, a simple mechanism that enables the 
processor or memory controller to identify each cache miss as either a conflict miss or a 
capacity (non-conflict) miss. The miss classification table works by storing part of the tag of 
the most recently evicted line of a cache set. If the next miss to that cache set has a 
matching tag, it is identified as a conflict miss. This technique correctly identifies 88%
of misses. Several applications of this i ... 

Keywords: Cache architecture, adaptive miss buffer, cache exclusion, conflict misses, 
prefetching, victim cache 



2 Hardware identification of cache conflict misses
Jamison D. Collins, Dean M. Tullsen 

November 1999 Proceedings of the 32nd annual ACM/IEEE international symposium on 
Microarchitecture

Additional Information: full citation, abstract, references, citings, index terms

This paper describes the Miss Classification Table, a simple mechanism that enables the
processor or memory controller to identify each cache miss as either a conflict miss or a 
capacity (non-conflict) miss. The miss classification table works by storing part of the tag of 
the most recently evicted line of a cache set. If the next miss to that cache set has a 
matching tag, it is identified as a conflict miss. This technique correctly identifies 87% of 
misses in the worst case. 
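
The mechanism in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the direct-mapped cache model, the 8-bit partial tag width, and the names `MissClassificationTable` and `DirectMappedCache` are all assumptions made for illustration.

```python
class MissClassificationTable:
    """One entry per cache set: partial tag of the most recently evicted line."""

    def __init__(self, num_sets, tag_bits=8):
        # tag_bits is an assumed width; the abstract only says "part of the tag".
        self.mask = (1 << tag_bits) - 1
        self.evicted = [None] * num_sets

    def record_eviction(self, set_index, tag):
        self.evicted[set_index] = tag & self.mask

    def classify_miss(self, set_index, tag):
        # A miss whose partial tag matches the last evicted line of this set
        # is classified as a conflict miss; anything else is non-conflict.
        if self.evicted[set_index] == (tag & self.mask):
            return "conflict"
        return "non-conflict"


class DirectMappedCache:
    """Hypothetical direct-mapped cache used only to drive the table."""

    def __init__(self, num_sets, line_bytes=64):
        self.num_sets = num_sets
        self.line_bytes = line_bytes
        self.tags = [None] * num_sets
        self.mct = MissClassificationTable(num_sets)

    def access(self, address):
        block = address // self.line_bytes
        set_index = block % self.num_sets
        tag = block // self.num_sets
        if self.tags[set_index] == tag:
            return "hit"
        # Classify against the previous eviction before recording the new one.
        kind = self.mct.classify_miss(set_index, tag)
        if self.tags[set_index] is not None:
            self.mct.record_eviction(set_index, self.tags[set_index])
        self.tags[set_index] = tag
        return kind


cache = DirectMappedCache(num_sets=4, line_bytes=64)
trace = [cache.access(a) for a in (0, 256, 0, 0)]
print(trace)
```

In this sketch, addresses 0 and 256 map to the same set, so the second access to address 0 misses on a line that was just evicted and is classified as a conflict miss.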

3 Eliminating conflict misses for high performance architectures 
Gabriel Rivera, Chau-Wen Tseng 

July 1998 Proceedings of the 12th international conference on Supercomputing 

Full text available: pdf(1.04 MB) Additional Information: full citation, references, citings, index terms

Gabriel Rivera, Chau-Wen Tseng

May 1998 ACM SIGPLAN Notices, Proceedings of the ACM SIGPLAN 1998 conference
on Programming language design and implementation, volume 33 issue 5 

Full text available: pdf(1.62 MB) Additional Information: full citation, abstract, references, citings, index terms

Many cache misses in scientific programs are due to conflicts caused by limited set 
associativity. We examine two compile-time data-layout transformations for eliminating 
conflict misses, concentrating on misses occurring on every loop iteration. Inter-variable
padding adjusts variable base addresses, while intra-variable padding modifies array 
dimension sizes. Two levels of precision are evaluated. PADLITE only uses array and column
dimension sizes, relying on assumptions about common array refe ... 
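
The effect of inter-variable padding can be seen in a small Python sketch. It does not reproduce the paper's padding algorithms; the cache size, line size, element size, and function names are assumptions chosen so that two arrays collide on every loop iteration.

```python
CACHE_BYTES = 16 * 1024   # assumed direct-mapped cache size
LINE_BYTES = 32           # assumed cache line size
NUM_SETS = CACHE_BYTES // LINE_BYTES


def cache_set(address):
    """Set index of a byte address in the assumed direct-mapped cache."""
    return (address // LINE_BYTES) % NUM_SETS


def conflicting_iterations(base_a, base_b, n, elem_bytes=8):
    """Count loop iterations i where a[i] and b[i] map to the same set."""
    return sum(
        cache_set(base_a + i * elem_bytes) == cache_set(base_b + i * elem_bytes)
        for i in range(n)
    )


n = 2048
base_a = 0
base_b = base_a + n * 8   # b laid out right after a, exactly one cache size apart

unpadded = conflicting_iterations(base_a, base_b, n)
# Inter-variable padding: shift b's base address by one cache line.
padded = conflicting_iterations(base_a, base_b + LINE_BYTES, n)
print(unpadded, padded)
```

With these assumed sizes, every iteration conflicts before padding and none conflicts after the base address of `b` is shifted by one line.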

5 Avoiding conflict misses dynamically in large direct-mapped caches

Brian N. Bershad, Dennis Lee, Theodore H. Romer, J. Bradley Chen 

November 1994 Proceedings of the sixth international conference on Architectural
support for programming languages and operating systems, volume 29, 28 issue 11, 5

Full text available: pdf(1.37 MB) Additional Information: full citation, abstract, references, citings, index terms

This paper describes a method for improving the performance of a large direct-mapped 
cache by reducing the number of conflict misses. Our solution consists of two components: 
an inexpensive hardware device called a Cache Miss Lookaside (CML) buffer that detects 
conflicts by recording and summarizing a history of cache misses, and a software policy 
within the operating system's virtual memory system that removes conflicts by dynamically 
remapping pages whenever large numbers of conflict miss ... 

6 Eliminating cache conflict misses through XOR-based placement functions

Antonio Gonzalez, Mateo Valero, Nigel Topham, Joan M. Parcerisa 

July 1997 Proceedings of the 11th international conference on Supercomputing 

Full text available: pdf(1.21 MB) Additional Information: full citation, references, citings, index terms



Keywords: XOR-based placement functions, cache memory, conflict misses 



7 Cache: Reducing traffic generated by conflict misses in caches
Pepijn J. de Langen, Ben Juurlink 

April 2004 Proceedings of the 1st conference on Computing frontiers 

Full text available: pdf(246.59 KB) Additional Information: full citation, abstract, references, citings, index terms

Off-chip memory accesses are a major source of power consumption in embedded 
processors. In order to reduce the amount of traffic between the processor and the off-chip 
memory as well as to hide the memory latency, nearly all embedded processors have a 
cache on the same die as the processor core. Because small caches dissipate less power and 
are cheaper than large caches, a small cache is preferable to a large cache. Furthermore, 
because set-associative caches consume more power than direct-mapp ... 

Keywords: caches, conflict misses, embedded processors, power reduction 



8 A permutation-based page interleaving scheme to reduce row-buffer conflicts and
exploit data locality 

Zhao Zhang, Zhichun Zhu, Xiaodong Zhang 
December 2000 Proceedings of the 33rd annual ACM/IEEE international symposium on
Microarchitecture 

Full text available: pdf(153.06 KB) ps(856.21 KB) Additional Information: full citation, references, citings, index terms

9 Reducing cache misses using hardware and software page placement
Timothy Sherwood, Brad Calder, Joel Emer 

May 1999 Proceedings of the 13th international conference on Supercomputing 

Full text available: pdf(1.50 MB) Additional Information: full citation, references, citings, index terms



10 The design and performance of a conflict-avoiding cache
Nigel Topham, Antonio Gonzalez, Jose Gonzalez 

December 1997 Proceedings of the 30th annual ACM/IEEE international symposium on 
Microarchitecture 

Full text available: Publisher Site Additional Information: full citation, abstract, references, citings, index terms

High performance architectures depend heavily on efficient multi-level memory hierarchies 
to minimize the cost of accessing data. This dependence will increase with the expected 
increases in relative distance to main memory. There have been a number of published 
proposals for cache conflict-avoidance schemes. We investigate the design and performance 
of conflict-avoiding cache architectures based on polynomial modulus functions, which 
earlier research has shown to be highly effective at reducing ... 

Keywords: cache architecture design, cache storage, conflict miss ratios, conflict-avoiding 
cache performance, data access cost minimization, high performance architectures, main 
memory, multi-level memory hierarchies, polynomial modulus functions 



11 Precise miss analysis for program transformations with caches of arbitrary associativity
Somnath Ghosh, Margaret Martonosi, Sharad Malik 

October 1998 Proceedings of the eighth international conference on Architectural
support for programming languages and operating systems, volume 32, 33 issue 5, 11

Full text available: pdf(1.67 MB) Additional Information: full citation, abstract, references, citings, index terms

Analyzing and optimizing program memory performance is a pressing problem in high- 
performance computer architectures. Currently, software solutions addressing the 
processor-memory performance gap include compiler- or programmer-applied optimizations
like data structure padding, matrix blocking, and other program transformations. Compiler 
optimization can be effective, but the lack of precise analysis and optimization frameworks 
makes it impossible to confidently make optimal, rather than h ... 

12 Trading conflict and capacity aliasing in conditional branch predictors
Pierre Michaud, Andre Seznec, Richard Uhlig 

May 1997 ACM SIGARCH Computer Architecture News, Proceedings of the 24th
annual international symposium on Computer architecture, volume 25 issue 2
Full text available: pdf(1.60 MB) Additional Information: full citation, abstract, references, citings, index terms

As modern microprocessors employ deeper pipelines and issue multiple instructions per 
cycle, they are becoming increasingly dependent on accurate branch prediction. Because
hardware resources for branch-predictor tables are invariably limited, it is not possible to 
hold all relevant branch history for all active branches at the same time, especially for large 
workloads consisting of multiple processes and operating-system code. The problem that 
results, commonly referred to as aliasing in the br ... 

Keywords: 3 C's classification, aliasing, branch prediction, skewed branch predictor 



13 Comparing data forwarding and prefetching for communication-induced misses in
shared-memory MPs 
David Koufaty, Josep Torrellas 

July 1998 Proceedings of the 12th international conference on Supercomputing 

Full text available: pdf(1.12 MB) Additional Information: full citation, references, citings, index terms



14 Code placement techniques for cache miss rate reduction 
Hiroyuki Tomiyama, Hiroto Yasuura 

October 1997 ACM Transactions on Design Automation of Electronic Systems
(TODAES), volume 2 issue 4
Full text available: pdf(288.29 KB) Additional Information: full citation, abstract, references, citings, index terms

In the design of embedded systems with cache memories, it is important to minimize the 
cache miss rates to reduce power consumption of the systems as well as improve the 
performance. In this article, we propose two code placement methods (a simplified method
and a refined one) to reduce miss rates of instruction caches. We first define a simplified 
code placement problem without an attempt to minimize the code size. The problem is 
formulated as an integer linear programming (ILP) problem, ... 

Keywords: code placement, instruction cache, integer linear programming 



15 Instruction prefetching of systems codes with layout optimized for reduced cache
misses

Chun Xia, Josep Torrellas 

May 1996 ACM SIGARCH Computer Architecture News, Proceedings of the 23rd
annual international symposium on Computer architecture, volume 24 issue 2
Full text available: pdf(1.65 MB) Additional Information: full citation, abstract, references, citings, index terms

High-performing on-chip instruction caches are crucial to keep fast processors busy. 
Unfortunately, while on-chip caches are usually successful at intercepting instruction fetches 
in loop-intensive engineering codes, they are less able to do so in large systems codes. To 
improve the performance of the latter codes, the compiler can be used to lay out the code in 
memory for reduced cache conflicts. Interestingly, such an operation leaves the code in a 
state that can be exploited by a new type of ... 

16 Reducing cache conflicts in data cache prefetching
Jin-Ho Lee, Min-Young Lee, Seong-Uk Choi, Myong-Soon Park 

September 1994 ACM SIGARCH Computer Architecture News, volume 22 issue 4 

Full text available: pdf(418.71 KB) Additional Information: full citation, citings, index terms

17 Missing the memory wall: the case for processor/memory integration
Ashley Saulsbury, Fong Pong, Andreas Nowatzyk 

May 1996 ACM SIGARCH Computer Architecture News, Proceedings of the 23rd
annual international symposium on Computer architecture, volume 24 issue 2

Full text available: pdf(1.45 MB) Additional Information: full citation, abstract, references, citings, index terms

Current high performance computer systems use complex, large superscalar CPUs that 
interface to the main memory through a hierarchy of caches and interconnect systems. 
These CPU-centric designs invest a lot of power and chip area to bridge the widening gap 
between CPU and main memory speeds. Yet, many large applications do not operate well on 
these systems and are limited by the memory subsystem performance. This paper argues for
an integrated system approach that uses less-powerful CPUs that are ... 



18 Column-associative caches: a technique for reducing the miss rate of direct-mapped
caches

Anant Agarwal, Stephen D. Pudar 

May 1993 ACM SIGARCH Computer Architecture News, Proceedings of the 20th
annual international symposium on Computer architecture, volume 21 issue 2
Full text available: pdf(1.17 MB) Additional Information: full citation, references, citings, index terms



19 Cache miss equations: an analytical representation of cache misses
Somnath Ghosh, Margaret Martonosi, Sharad Malik 

July 1997 Proceedings of the 11th international conference on Supercomputing 

Full text available: pdf(1.98 MB) Additional Information: full citation, references, citings, index terms



20 Memory optimization for embedded systems: Improved indexing for cache miss 
reduction in embedded systems 
Tony Givargis 

June 2003 Proceedings of the 40th conference on Design automation 

Full text available: pdf(215.59 KB) Additional Information: full citation, abstract, references, index terms

The increasing use of microprocessor cores in embedded systems as well as mobile and 
portable devices creates an opportunity for customizing the cache subsystem for improved 
performance. In traditional cache design, the index portion of the memory address bus 
consists of the K least significant bits, where K=log2(D) and D is the depth of the cache. 
However, in devices where the application set is known and characterized (e.g., systems 
that execute a fixed application set) there is an opportunity ... 

Keywords: cache optimization, design space exploration, index hashing 
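
The traditional indexing rule stated in this abstract, K = log2(D) least significant bits for a cache of depth D, can be sketched in Python as follows. The function name is hypothetical, and the input is assumed to be a block address with the line-offset bits already removed; the paper's improved indexing scheme is not shown.

```python
def conventional_index(block_address, depth):
    """Traditional cache index: the K least significant bits, K = log2(depth).

    Assumes depth is a power of two and block_address already excludes
    the line-offset bits.
    """
    if depth <= 0 or depth & (depth - 1):
        raise ValueError("depth must be a power of two")
    k = depth.bit_length() - 1          # K = log2(D)
    return block_address & ((1 << k) - 1)


# Two block addresses that differ only above bit K share an index,
# which is the situation that produces conflict misses.
print(conventional_index(0x100, 256), conventional_index(0x200, 256))
```

Both calls in the usage line yield the same index, since the addresses differ only in bits above the K = 8 index bits.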